chore(profiles): sync GPU profiles with pcie_topology/NUMA from upstream (RUN-40173) by eliranw · Pull Request #212 · run-ai/fake-gpu-operator

eliranw · 2026-06-04T14:16:54Z

What

Re-syncs the built-in GPU profiles via hack/sync-profiles.sh from upstream NVIDIA/k8s-test-infra main (synced commit 497fa04), and improves the sync tooling.

The previous pin was the v0.1.0 tag, which predates upstream's pcie_topology block (PCI root complexes + per-device numa_node). Without it, the mock backend's device-plugin can't report GPU→NUMA, so NodeResourceTopology zones never get GPU resources and NUMA/topology-aware scheduling can't be exercised on mock GPUs. pcie_topology currently exists only on upstream main (no tagged release yet).

Changes

hack/sync-profiles.sh:
- DEFAULT_VERSION now tracks main (the committed builtin.yaml is the pinned artifact; each sync produces a reviewable PR diff). Override with a tag/commit once upstream cuts a release.
- Accepts a tag, branch, OR commit SHA: blobless+sparse clone then git checkout <ref> (the old git clone --branch only accepted tags/branches, though the workflow advertises "tag or commit").
- The generated header records the exact resolved commit for provenance: # Source: NVIDIA/k8s-test-infra main (commit 497fa04).
builtin.yaml (regenerated): all 7 profiles now carry pcie_topology (a100/b200/gb200/gb300/h100/l40s/t4); gb300 is new.
.github/workflows/sync-gpu-profiles.yml: fix version extraction (head -2 → head -3). The # Source: line is line 3 of the generated header, so head -2 never reached it and the extracted version was empty.
CHANGELOG.md: entry under [Unreleased] → Changed.
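
The ref handling described for hack/sync-profiles.sh could look roughly like the sketch below. It is not the script from this PR: `fetch_ref`, `source_header`, and their parameters are hypothetical names, and the directory passed to `git sparse-checkout set` is a placeholder, since the upstream path is not stated here. Only the clone strategy (blobless, then `git checkout <ref>`) and the header format are taken from the description above.

```shell
# fetch_ref URL REF SPARSE_DIR DEST   (hypothetical helper, for illustration only)
# Clones without file contents, restricts the working tree to SPARSE_DIR,
# checks out REF, and prints the short SHA that REF resolved to.
fetch_ref() {
  local url="$1" ref="$2" sparse_dir="$3" dest="$4"

  # Blobless clone with no working tree yet: commits and trees are fetched,
  # file contents are downloaded on demand at checkout time.
  git clone --quiet --filter=blob:none --no-checkout "$url" "$dest" || return 1

  # Limit the working tree to the directory that holds the profiles.
  git -C "$dest" sparse-checkout set "$sparse_dir" || return 1

  # `git clone --branch` only takes a tag or branch name; a separate checkout
  # also accepts a commit SHA.
  git -C "$dest" checkout --quiet "$ref" || return 1

  # Resolve the ref to the exact commit for the provenance header.
  git -C "$dest" rev-parse --short HEAD
}

# source_header REF COMMIT   (hypothetical helper)
# Formats the provenance line in the shape quoted in this PR.
source_header() {
  printf '# Source: NVIDIA/k8s-test-infra %s (commit %s)\n' "$1" "$2"
}
```

A call such as `commit="$(fetch_ref "$UPSTREAM_URL" main "$PROFILES_DIR" "$(mktemp -d)/src")"` followed by `source_header main "$commit"` would then emit the header line; `UPSTREAM_URL` and `PROFILES_DIR` are placeholders to be filled in by the caller.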

Verification

hack/sync-profiles.sh (default → main) ran clean → 7 profiles, header pinned to 497fa04.
builtin.yaml parses as valid YAML (7 ConfigMap docs); each profile contains pcie_topology.
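
A check along these lines can be scripted. The sketch below is a text-level approximation, not a YAML parse and not the verification actually run for this PR: it splits the file on `---` separators, counts documents whose kind is ConfigMap, and counts how many of those never mention pcie_topology. The function name `count_profiles` and the sample document contents (metadata names, data keys) are made up for illustration.

```shell
# count_profiles FILE   (hypothetical helper)
# Prints "<configmap_docs> <configmap_docs_missing_pcie_topology>".
count_profiles() {
  awk '
    function close_doc() {
      if (is_cm) { docs++; if (!has_topo) missing++ }
      is_cm = 0; has_topo = 0
    }
    /^---[ \t]*$/            { close_doc(); next }   # document separator
    /^kind:[ \t]*ConfigMap/  { is_cm = 1 }
    /pcie_topology/          { has_topo = 1 }
    END { close_doc(); print docs + 0, missing + 0 }
  ' "$1"
}

# Illustrative two-document sample: the first profile has pcie_topology,
# the second does not.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: profile-a
data:
  profile.yaml: |
    pcie_topology: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: profile-b
data:
  profile.yaml: |
    name: b
EOF

count_profiles "$sample"   # prints "2 1"
```

Against the regenerated builtin.yaml, the result reported above would correspond to an output of `7 0`.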

Context

Found while testing the mock backend on a real-NUMA EKS node: the real NFD topology-updater produces a populated NRT (CPU zone cap=8 alloc=7), but GPUs don't appear in the zones because the rendered mock profile had no pcie_topology. This sync is the prerequisite for mock-backend GPU↔NUMA.

RUN-40173

…eam (RUN-40173)

Re-sync built-in GPU profiles from upstream NVIDIA/k8s-test-infra main (commit 497fa04), which adds the pcie_topology block (PCI root complexes + per-device numa_node) plus the gb300 profile -- what the mock backend needs to surface GPU->NUMA for NodeResourceTopology / topology-aware scheduling.

hack/sync-profiles.sh now defaults to tracking main and accepts a tag, branch, OR commit SHA (blobless clone + checkout instead of clone --branch); the generated header records the exact synced commit for provenance. Also fix the sync-gpu-profiles workflow version extraction (head -2 -> head -3) so it captures the Source line.

Signed-off-by: Eliran Wolff <eliranw@nvidia.com>
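
The head -2 versus head -3 difference can be reproduced on a stand-in header. In the sketch below only the text of the # Source: line and its position (line 3) come from this PR; the first two header lines, the helper name `extract_source`, and the grep/sed pipeline are assumptions, since the workflow's actual extraction command is not shown here.

```shell
# Stand-in header: lines 1 and 2 are invented, line 3 is the Source line.
header_file="$(mktemp)"
cat > "$header_file" <<'EOF'
# Hypothetical header line 1
# Hypothetical header line 2
# Source: NVIDIA/k8s-test-infra main (commit 497fa04)
EOF

# extract_source N FILE   (hypothetical helper)
# Scans the first N lines of FILE and prints the text after "# Source: ".
extract_source() {
  head -n "$1" "$2" | grep '^# Source:' | sed 's/^# Source: //'
}

old="$(extract_source 2 "$header_file")"   # stops before line 3
new="$(extract_source 3 "$header_file")"   # includes the Source line

echo "head -2: '${old}'"
echo "head -3: '${new}'"
```

With two lines scanned the result is an empty string; with three it is `NVIDIA/k8s-test-infra main (commit 497fa04)`.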

eliranw requested a review from a team as a code owner June 4, 2026 14:16

eliranw force-pushed the eliranw/RUN-40173-sync-gpu-profiles-numa branch from 4565e19 to b9f3198 Compare June 4, 2026 14:43

eliranw force-pushed the eliranw/RUN-40173-sync-gpu-profiles-numa branch from b9f3198 to 7a30881 Compare June 4, 2026 14:46

iris-shain-runai approved these changes Jun 4, 2026

eliranw merged commit b881831 into main Jun 4, 2026
11 checks passed

eliranw deleted the eliranw/RUN-40173-sync-gpu-profiles-numa branch June 4, 2026 15:35

eliranw merged 1 commit into main from eliranw/RUN-40173-sync-gpu-profiles-numa

2 participants
